Setting: $X_1, \dots, X_n \overset{\text{iid}}{\sim} p_\theta$, where $p_\theta$ is "smooth" in $\theta \in \Theta \subseteq \mathbb{R}^d$. Assume $I(\theta_0)$ is positive definite (see Fisher information), and that the MLE is consistent: $\hat\theta_n \xrightarrow{p} \theta_0$. Then if $\theta = \theta_0$, $$\sqrt{n}\,(\hat\theta_n - \theta_0) \Rightarrow N\big(0, I(\theta_0)^{-1}\big).$$
Since we used the data to get $\hat\theta_n$, we can use this limiting distribution for inference on $\theta_0$!
1.1 Wald-Type Confidence Regions
Assume we have some estimator $\hat I_n$ s.t. $\hat I_n \xrightarrow{p} I(\theta_0)$; then we plug in: if $\theta = \theta_0$, then $\hat I_n^{1/2} \xrightarrow{p} I(\theta_0)^{1/2}$, so by Slutsky's theorem $$\sqrt{n}\,\hat I_n^{1/2}(\hat\theta_n - \theta_0) \Rightarrow N(0, I_d).$$ This leads to a test of $H_0: \theta = \theta_0$: $$W_n = n\,(\hat\theta_n - \theta_0)^\top \hat I_n\, (\hat\theta_n - \theta_0) \Rightarrow \chi^2_d.$$
So we reject $H_0$ at level $\alpha$ iff $W_n > \chi^2_{d, 1-\alpha}$.
For $d = 1$, we reject iff $n \hat I_n (\hat\theta_n - \theta_0)^2 > \chi^2_{1, 1-\alpha}$, i.e. iff $\sqrt{n \hat I_n}\,|\hat\theta_n - \theta_0| > z_{1-\alpha/2}$.
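As a concrete sketch of the $d = 1$ rejection rule (an assumed Bernoulli example, not from the notes; the function name and hardcoded quantile are illustrative):

```python
import numpy as np

def wald_test_bernoulli(x, p0):
    """Wald test of H0: p = p0 for iid Bernoulli data (d = 1)."""
    n = len(x)
    p_hat = x.mean()                          # MLE of p
    info_hat = 1.0 / (p_hat * (1 - p_hat))    # plug-in Fisher information I(p_hat)
    W = n * info_hat * (p_hat - p0) ** 2      # ~ chi^2_1 under H0
    chi2_1_95 = 3.8415                        # 0.95 quantile of chi^2_1
    return W, bool(W > chi2_1_95)

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.5, size=1000)           # H0: p = 0.5 is true here
W, reject = wald_test_bernoulli(x, p0=0.5)
```

With $H_0$ true, `reject` comes out `True` only about 5% of the time over repeated samples.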
More information $\Rightarrow$ smaller ellipse (the region shrinks like $1/\sqrt{n}$).
Options for $\hat I_n$:
Most obvious: plug the MLE into the Fisher information: $\hat I_n = I(\hat\theta_n)$.
Observed Fisher information: $\hat I_n = -\frac{1}{n} \sum_{i=1}^n \nabla^2_\theta \log p_{\hat\theta_n}(X_i)$.
Both satisfy $\hat I_n \xrightarrow{p} I(\theta_0)$ (under regularity: continuous second derivatives; MLE consistent).
Both make sense outside of the iid setting.
Heuristically, the plug-in estimate measures the information about $\theta$ in a "typical" data set, while the observed information measures the information about $\theta$ in "this" data set.
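To make the plug-in vs. observed distinction concrete, a sketch (an assumed example, not from the notes) for the Cauchy location model: there $I(\theta) = 1/2$ for every $\theta$, so the plug-in estimate is the constant $1/2$ for any data set, while the observed information genuinely depends on the sample. The grid-search MLE is just for illustration.

```python
import numpy as np

# Cauchy location model: log p_theta(x) = -log(pi) - log(1 + (x - theta)^2)
def loglik(theta, x):
    return -len(x) * np.log(np.pi) - np.log1p((x - theta) ** 2).sum()

rng = np.random.default_rng(1)
x = rng.standard_cauchy(500)

# crude MLE via grid search (fine for a 1-d illustration)
grid = np.linspace(-2, 2, 4001)
theta_hat = grid[np.argmax([loglik(t, x) for t in grid])]

u = x - theta_hat
# observed information: -(1/n) * sum of second derivatives of log p at theta_hat
obs_info = -np.mean((2 * u**2 - 2) / (1 + u**2) ** 2)
plug_in = 0.5    # I(theta_hat) = 1/2 for the Cauchy location family
```

Both estimates converge to $I(\theta_0) = 1/2$, but `obs_info` varies from sample to sample.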
Wald interval for $d = 1$: $\hat\theta_n \pm z_{1-\alpha/2} / \sqrt{n \hat I_n}$.
Confidence ellipsoid: $C_n = \{\theta : n\,(\hat\theta_n - \theta)^\top \hat I_n\, (\hat\theta_n - \theta) \le \chi^2_{d, 1-\alpha}\}$, with $P_{\theta_0}(\theta_0 \in C_n) \to 1 - \alpha$.
More generally, the same construction works if $\tilde\theta_n$ is any consistent estimator with $\sqrt{n}\,(\tilde\theta_n - \theta_0) \Rightarrow N(0, \Sigma(\theta_0))$ and we have $\hat\Sigma_n \xrightarrow{p} \Sigma(\theta_0)$ ($\tilde\theta_n$ not necessarily the MLE).
Example (Generalized linear model, fixed design)
$x_1, \dots, x_n \in \mathbb{R}^d$ fixed. $Y_1, \dots, Y_n$ independent with $Y_i \sim p_{\eta_i}$, $\eta_i = x_i^\top \beta$ (canonical form): $p_\eta(y) = e^{\eta y - A(\eta)} h(y)$.
Let $\ell_n(\beta) = \sum_{i=1}^n \log p_{x_i^\top \beta}(Y_i)$.
Most common examples: Logistic regression: $Y_i \sim \mathrm{Bernoulli}(\mu_i)$, $\mu_i = e^{\eta_i} / (1 + e^{\eta_i})$.
Poisson regression: $Y_i \sim \mathrm{Poisson}(\mu_i)$, $\mu_i = e^{\eta_i}$.
Taylor expansion of $\ell_n$: $\nabla \ell_n(\beta) = \sum_{i=1}^n \big(Y_i - A'(x_i^\top \beta)\big) x_i$ and $\nabla^2 \ell_n(\beta) = -\sum_{i=1}^n A''(x_i^\top \beta)\, x_i x_i^\top$, which does not depend on $Y$, so the observed and plug-in information coincide here.
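A minimal sketch (illustrative, not the notes' code) of Poisson regression fit by Newton's method with exactly this gradient and Hessian, plus Wald standard errors from the inverse information:

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Poisson regression (canonical log link) via Newton's method.
    grad l = X^T (y - mu), hess l = -X^T diag(mu) X, with mu = exp(X beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])          # negative Hessian
        beta = beta + np.linalg.solve(info, grad)
    se = np.sqrt(np.diag(np.linalg.inv(info)))  # Wald standard errors
    return beta, se

rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat, se = fit_poisson(X, y)                # beta_hat close to beta_true
```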
Advantages and Disadvantages
Advantages:
- Easy to invert; simple confidence regions.
- Asymptotically correct.
Disadvantages:
- Have to compute the MLE.
- Depends on the parameterization.
- Relies on two approximations: $\hat\theta_n \approx N(\theta_0, I(\theta_0)^{-1}/n)$ and $\hat I_n \approx I(\theta_0)$.
- Needs the MLE to be consistent.
- The confidence interval/ellipsoid could go outside $\Theta$.
1.2 Score Test
Test $H_0: \theta = \theta_0$.
We can bypass the quadratic approximation entirely by using the score as the test statistic: under $H_0$, $$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \nabla_\theta \log p_{\theta_0}(X_i) \Rightarrow N\big(0, I(\theta_0)\big).$$
We could reject if $Z_n^\top I(\theta_0)^{-1} Z_n > \chi^2_{d, 1-\alpha}$. Can do 1-sided tests (for $d = 1$).
No approximation of the MLE's distribution is needed, and no MLE has to be computed.
Don't need to estimate the Fisher information: under $H_0$ we can evaluate $I(\theta_0)$ exactly.
Can be generalized to the case with nuisance parameters: write $\theta = (\gamma, \xi)$ and test $H_0: \gamma = \gamma_0$; typically estimate $\xi$ via MLE on the null model.
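For $d = 1$ Bernoulli data the score statistic has a closed form that needs no MLE at all; a sketch (an assumed example, names illustrative):

```python
import numpy as np

def score_test_bernoulli(x, p0):
    """Score statistic for H0: p = p0 with iid Bernoulli data; no MLE needed.
    Per-observation score at p0: (x - p0) / (p0 (1 - p0)); I(p0) = 1 / (p0 (1 - p0)).
    The statistic simplifies to (sum(x) - n p0)^2 / (n p0 (1 - p0)) ~ chi^2_1."""
    n = len(x)
    return (x.sum() - n * p0) ** 2 / (n * p0 * (1 - p0))

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.6, size=500)            # true p = 0.6
stat = score_test_bernoulli(x, p0=0.5)        # H0 is false: expect a large value
```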
The score test is invariant to reparameterization: assume $\theta = g(\phi)$ with $g$ smooth and invertible, and let $J = \nabla_\phi g(\phi_0)$. Then $\nabla_\phi \log p_{g(\phi_0)}(x) = J^\top \nabla_\theta \log p_{\theta_0}(x)$ and $I_\phi(\phi_0) = J^\top I_\theta(\theta_0) J$, so the statistic $Z_n^\top I^{-1} Z_n$ is unchanged if $J$ is invertible.
Example ($d$-parameter exponential family)
$p_\theta(x) = e^{\theta^\top T(x) - A(\theta)} h(x)$. So $\nabla_\theta \log p_\theta(x) = T(x) - \nabla A(\theta)$, and $I(\theta) = \nabla^2 A(\theta)$. So the score test statistic is $$n\,\big(\bar T - \nabla A(\theta_0)\big)^\top \nabla^2 A(\theta_0)^{-1} \big(\bar T - \nabla A(\theta_0)\big) \Rightarrow \chi^2_d, \qquad \bar T = \frac{1}{n} \sum_{i=1}^n T(X_i).$$
Pearson's $\chi^2$ test (goodness of fit)
$X_1, \dots, X_n$ iid on $\{1, \dots, k\}$ with $P(X_i = j) = \pi_j$; let $N_j = \#\{i : X_i = j\}$. Note $\pi_k = 1 - \sum_{j=1}^{k-1} \pi_j$, so this is a full-rank $(k-1)$-parameter exponential family.
So the score at $\pi_0$ has components $\frac{N_j}{\pi_{0,j}} - \frac{N_k}{\pi_{0,k}}$ for $j = 1, \dots, k-1$.
Here we use the per-observation information $I(\pi_0)_{jl} = \frac{\delta_{jl}}{\pi_{0,j}} + \frac{1}{\pi_{0,k}}$; inverting it, the score test of $H_0: \pi = \pi_0$ rejects iff $$\sum_{j=1}^k \frac{(N_j - n \pi_{0,j})^2}{n \pi_{0,j}} > \chi^2_{k-1, 1-\alpha}.$$
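A sketch of the resulting statistic; the fair-die counts below are made-up data:

```python
import numpy as np

def pearson_chi2(counts, pi0):
    """Pearson goodness-of-fit statistic: sum_j (N_j - n pi0_j)^2 / (n pi0_j)."""
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() * np.asarray(pi0)
    return ((counts - expected) ** 2 / expected).sum()

# H0: a die is fair, pi0 = (1/6, ..., 1/6); k - 1 = 5 degrees of freedom
counts = np.array([18, 22, 17, 25, 16, 22])   # n = 120 rolls (made-up)
stat = pearson_chi2(counts, np.full(6, 1 / 6))
# compare to chi^2_{5, 0.95} = 11.07; here stat = 3.1, so do not reject
```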
2 Generalized LRT
Test $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$. Taylor expand $\ell_n$ around $\hat\theta_n$: $$\ell_n(\theta_0) \approx \ell_n(\hat\theta_n) - \frac{n}{2}\,(\hat\theta_n - \theta_0)^\top \hat I_n\, (\hat\theta_n - \theta_0).$$
Test statistic: $T_n = 2\big(\ell_n(\hat\theta_n) - \ell_n(\theta_0)\big) \Rightarrow \chi^2_d$ under $H_0$.
Consider the composite null $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \notin \Theta_0$, and assume:
- $\Theta \subseteq \mathbb{R}^d$ is open; $\Theta_0 \subseteq \Theta$ is a $d_0$-dimensional manifold.
- $\theta_0 \in \Theta_0$ (the null holds).
- The MLEs over $\Theta$ and over $\Theta_0$ are both consistent.
- The likelihood is "smooth".
Then $T_n = 2\big(\sup_{\theta \in \Theta} \ell_n(\theta) - \sup_{\theta \in \Theta_0} \ell_n(\theta)\big) \Rightarrow \chi^2_{d - d_0}$, where $d - d_0$ is the number of constraints imposed by $H_0$.
Why? Assume WLOG $\theta_0 = 0$ and $I(\theta_0) = I_d$ (after a linear reparameterization). Then $\sqrt{n}\,\hat\theta_n \Rightarrow Z \sim N(0, I_d)$. And locally, near $\theta_0$, $\Theta_0$ looks like a $d_0$-dimensional linear subspace, so $T_n$ is approximately the squared distance from $\sqrt{n}\,\hat\theta_n$ to that subspace; the squared norm of the $(d - d_0)$-dimensional orthogonal component of $Z$ is $\chi^2_{d - d_0}$.
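A quick simulation check of this limit (an assumed Gaussian example where both suprema are available in closed form): with $X_i \sim N(\theta, I_3)$ and $\Theta_0 = \{\theta : \theta_2 = \theta_3 = 0\}$ ($d = 3$, $d_0 = 1$), the GLRT statistic reduces to $n(\bar X_2^2 + \bar X_3^2)$, which should be $\chi^2_2$.

```python
import numpy as np

rng = np.random.default_rng(4)
d, d0, n, reps = 3, 1, 50, 4000
stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=(n, d))               # theta = 0, so H0 holds
    xbar = x.mean(axis=0)
    stats[r] = n * (xbar[d0:] ** 2).sum()     # 2(sup_Theta l - sup_Theta0 l)
mean_stat = stats.mean()                      # chi^2_2 has mean 2
frac_below = (stats <= 5.991).mean()          # P(chi^2_2 <= 5.991) = 0.95
```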
3 Asymptotic Equivalence
Recall the quadratic approximation picture ($d = 1$): $\ell_n(\theta) \approx \ell_n(\hat\theta_n) - \frac{n \hat I_n}{2} (\theta - \hat\theta_n)^2$.
Then Wald: $n \hat I_n (\hat\theta_n - \theta_0)^2$; Score: $\ell_n'(\theta_0)^2 / (n I(\theta_0))$; LRT: $2(\ell_n(\hat\theta_n) - \ell_n(\theta_0))$. Under the quadratic approximation (with $\hat I_n \approx I(\theta_0)$), all three statistics coincide, so the tests are asymptotically equivalent under $H_0$.
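A numerical comparison on one data set (an assumed Bernoulli example): the three statistics differ only in where the curvature is evaluated, and are close for large $n$.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.binomial(1, 0.55, size=2000)
n, p0, p_hat = len(x), 0.5, x.mean()

wald = n * (p_hat - p0) ** 2 / (p_hat * (1 - p_hat))   # curvature at p_hat
score = n * (p_hat - p0) ** 2 / (p0 * (1 - p0))        # curvature at p0
lrt = 2 * n * (p_hat * np.log(p_hat / p0)
               + (1 - p_hat) * np.log((1 - p_hat) / (1 - p0)))
# all three are compared to the same chi^2_1 quantile
```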
4 Asymptotic Relative Efficiency (ARE)
Suppose $\hat\theta_{n,1}, \hat\theta_{n,2}$ are two asymptotically normal estimators of $\theta$, with $\sqrt{n}\,(\hat\theta_{n,j} - \theta) \Rightarrow N(0, \sigma_j^2)$, $j = 1, 2$. The ARE of $\hat\theta_{n,2}$ w.r.t. $\hat\theta_{n,1}$ is $e = \sigma_1^2 / \sigma_2^2$. E.g. if $\sigma_2^2 = 2 \sigma_1^2$, then $\hat\theta_{n,2}$ is 50% as efficient. Interpretation: suppose $e < 1$. Then for large $n$, $\hat\theta_{en,1}$ is about as accurate as $\hat\theta_{n,2}$: using $\hat\theta_{n,2}$ is like throwing away a fraction $1 - e$ of the data and then using estimator 1.
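A simulation sketch of the classical instance of this (an assumed example): for $N(\theta, 1)$ data the sample mean has asymptotic variance $1$ and the sample median $\pi/2$, so the ARE of the median w.r.t. the mean is $2/\pi \approx 0.64$, i.e. the median effectively discards about 36% of the data.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 400, 5000
samples = rng.normal(size=(reps, n))            # theta = 0
var_mean = n * samples.mean(axis=1).var()       # -> sigma_1^2 = 1
var_med = n * np.median(samples, axis=1).var()  # -> sigma_2^2 = pi/2
are = var_mean / var_med                        # -> 2/pi ~ 0.637
```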